WRANGLERS
Photo by Nikko Macaspac on Unsplash
Man is not the sum of what he has already, but rather the sum of what he does not yet have, of what he could have….
— Jean-Paul Sartre
We will group this data into different parts and apply one or more functions to each of them, such as sum, mean, median, max and min. Since we have some missing values, let’s first replace the “NA” data in the numeric columns with 0 (we will revisit this pattern later in the course).
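library(tidyverse)  # read_csv, dplyr verbs, tidyr pivots, ggplot2
library(ggthemes)   # theme_tufte, used in the chart further down
library(ggiraph)    # girafe, used in the chart further down

df <- read_csv('./archetypes/missing-migrants/MissingMigrants-Global-2020-10-18T06-37-33.csv')
df <- df %>%
  mutate_if(is.numeric, ~replace(., is.na(.), 0))  # NA -> 0 in numeric columns only
df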
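To see what the replacement step does, here is a minimal sketch on a made-up tibble (the name `toy` and its columns are hypothetical, not part of the Missing Migrants data). It uses `across()` with `where(is.numeric)`, the style that current dplyr documentation suggests in place of `mutate_if()`; both should give the same result here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: a numeric column with a gap and a character column with a gap
toy <- tibble(
  region = c("A", "B", NA),
  dead   = c(3, NA, 5)
)

# Replace NA with 0 in numeric columns only; the character column is left untouched
toy_clean <- toy %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))

toy_clean
```

Note that the missing `region` stays `NA`: only numeric columns are filled, which is also how the `mutate_if(is.numeric, ...)` call above behaves.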
First, let’s create a simple group-by on “Reported Year” and calculate the sum of the “Total Dead and Missing” persons.
df_1 <- df %>%
  group_by(`Reported Year`) %>%
  summarise(sum_number = sum(`Total Dead and Missing`))
df_1
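The same pattern extends to the other functions mentioned earlier (mean, median, max and min). The sketch below uses a small made-up tibble (`toy`, `year` and `total` are hypothetical names) so that the numbers are easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data: a few incident counts per year
toy <- tibble(
  year  = c(2014, 2014, 2015, 2015, 2015),
  total = c(10, 20, 5, 15, 40)
)

# Several summary functions can be computed in a single summarise() call
toy_summary <- toy %>%
  group_by(year) %>%
  summarise(
    sum_number    = sum(total),
    mean_number   = mean(total),
    median_number = median(total),
    max_number    = max(total),
    min_number    = min(total)
  )

toy_summary
```

Each grouped row collapses into one row per year, with one column per summary function.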
Let’s now group by “Reported Year” and “Reported Month”, calculating the sum of the “Total Dead and Missing”.
df_2 <- df %>%
  group_by(`Reported Year`, `Reported Month`) %>%
  summarise(sum_number = sum(`Total Dead and Missing`))
df_2
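One detail worth knowing when grouping by two columns: by default `summarise()` drops only the last grouping variable, so the result stays grouped by the first one. The sketch below shows this on made-up data (`toy` and its columns are hypothetical) and how the `.groups = "drop"` argument returns an ungrouped tibble instead.

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
toy <- tibble(
  year  = c(2014, 2014, 2014, 2015),
  month = c("Jan", "Jan", "Feb", "Jan"),
  total = c(1, 2, 3, 4)
)

# Default behaviour: the last grouping variable (month) is dropped,
# so the result is still grouped by year
by_default <- toy %>%
  group_by(year, month) %>%
  summarise(sum_number = sum(total))
group_vars(by_default)

# With .groups = "drop" the result carries no grouping at all
ungrouped <- toy %>%
  group_by(year, month) %>%
  summarise(sum_number = sum(total), .groups = "drop")
group_vars(ungrouped)
```

Either form works for the steps that follow; dropping the groups simply avoids surprises if more dplyr verbs are chained on later.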
Notice that the result (df_2) is presented vertically, with only 3 columns, and each row represents one observation (here, the sum for a specific year and month). This is called the “long data format”, and it is the format the chart below expects.
df_2$`Reported Month` <- factor(df_2$`Reported Month`, levels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
df_2$`Reported Year` <- factor(df_2$`Reported Year`, levels=c('2014','2015','2016','2017','2018','2019','2020'))
year_palette <- c("2014" = "#E9EAF6", "2015" = "#C5C8E0", "2016" = "#959DCA", "2017" = "#4764A8", "2018" = "#6F7DB6",
"2019" = "#33B5B3", "2020" = "#C5C8E0")
v1 <- ggplot(df_2, aes(x = `Reported Month`, y = sum_number, fill = `Reported Year`)) +
  # light horizontal reference lines (linewidth replaces size for lines in ggplot2 >= 3.4)
  geom_hline(yintercept = c(0, 500, 1000, 1500), linewidth = 0.1, color = "grey") +
  # one bar per year, side by side within each month
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = year_palette, name = "YEAR") +
  # value labels, rotated and placed just above each bar
  geom_text(aes(label = sum_number), hjust = -0.1, size = 3, angle = 90,
            position = position_dodge(0.9)) +
  theme_tufte(base_size = 15) +
  theme(
    panel.background = element_blank(),
    plot.title = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.ticks = element_blank()
    #plot.margin = unit(c(1, 5, 1, 1), "lines")
  ) +
  theme(legend.position = "top") +
  # the legend belongs to the fill aesthetic, so the guide is set on fill
  guides(fill = guide_legend(nrow = 1))
girafe(ggobj = v1, width_svg = 16, height_svg = 9,
       options = list(opts_sizing(rescale = TRUE, width = 1.0)))
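The chart code above starts by converting “Reported Month” into a factor with explicit levels. The short sketch below, on a made-up vector, illustrates why: plain character values are ordered alphabetically, while a factor keeps the order of its levels, which is the order ggplot2 uses for a discrete axis.

```r
# Character months sort alphabetically, which scrambles the calendar order
months_chr <- c("Jan", "Feb", "Mar", "Apr")
sort(months_chr)

# A factor with explicit levels sorts in the order the levels were given
months_fct <- factor(months_chr, levels = c("Jan", "Feb", "Mar", "Apr"))
sort(months_fct)
levels(months_fct)
```

Base R also ships the constant `month.abb`, which holds the twelve abbreviated English month names in calendar order and could be passed as `levels` instead of typing them out.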
Although the “long data format” is what the charting library needs, it is not the most human-friendly presentation. Let’s pivot it into the “wide format”, with one column per month, to make the data easier for users to review.
df_3 <- df_2 %>%
  pivot_wider(names_from = `Reported Month`, values_from = sum_number) %>%
  relocate(`Reported Year`, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
df_3
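Here is a minimal sketch of the same pivot on made-up data (`toy_long` and its columns are hypothetical). It also shows what happens when a year and month combination has no row: the wide table gets an `NA` in that cell unless a fill value is supplied through `values_fill`.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data; note there is no row for Feb 2020
toy_long <- tibble(
  year       = c(2019, 2019, 2020),
  month      = c("Jan", "Feb", "Jan"),
  sum_number = c(10, 20, 30)
)

# One row per year, one column per month; the missing cell becomes NA
toy_wide <- toy_long %>%
  pivot_wider(names_from = month, values_from = sum_number)
toy_wide

# Same pivot, filling missing combinations with 0 instead of NA
toy_wide_filled <- toy_long %>%
  pivot_wider(names_from = month, values_from = sum_number, values_fill = 0)
toy_wide_filled
```

Whether 0 or `NA` is the right fill depends on the data: 0 states that nothing was recorded, while `NA` states that the value is unknown.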
Let’s unpivot this data to return to “Long data format”.
df_4 <- df_3 %>%
  pivot_longer(!`Reported Year`, names_to = "MONTH", values_to = "TOTAL_DEAD")
df_4
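As a quick check that the two pivots mirror each other, the sketch below runs a round trip on made-up data (`toy_long` and its columns are hypothetical) and compares the result with the starting table.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data with every year and month combination present
toy_long <- tibble(
  year  = c(2019, 2019, 2020, 2020),
  month = c("Jan", "Feb", "Jan", "Feb"),
  total = c(10, 20, 30, 40)
)

# Long -> wide -> long again
round_trip <- toy_long %>%
  pivot_wider(names_from = month, values_from = total) %>%
  pivot_longer(!year, names_to = "month", values_to = "total")

round_trip

# Compare the columns of the round-tripped table with the original
identical(round_trip$month, toy_long$month)
identical(round_trip$total, toy_long$total)
```

One thing to keep in mind: the column created by `names_to` comes back as plain character, so a factor ordering such as the month levels set earlier would need to be reapplied after unpivoting.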
Data source: IOM, Missing Migrants Project
@misc{missingmigrants_2001_missing,
author = {MissingMigrants},
month = {06},
title = {Missing Migrants Project},
url = {https://missingmigrants.iom.int/},
urldate = {2021-06-08},
year = {2001},
organization = {Iom.int}
}